Revisiting bounded context block-sorting transformations
Authors
Abstract
The Burrows-Wheeler Transform (bwt) produces a permutation of a string X, denoted X∗, by sorting the n cyclic rotations of X into full lexicographical order and taking the last column of the resulting n × n matrix to be X∗. The transformation is reversible in O(n) time. In this paper, we consider an alteration to the process, called k-bwt, where rotations are only sorted to a depth k. We propose new approaches to the forward and reverse transform, and show the methods are efficient in practice. More than a decade ago, two algorithms were independently discovered for reversing k-bwt, both of which run in O(nk) time. Two recent algorithms have lowered the bounds for the reverse transformation to O(n log k) and O(n) respectively. We examine the practical performance of these reversal algorithms. We find the original O(nk) approach is the most efficient in practice, and investigate new approaches, aimed at further speeding reversal, which store precomputed context boundaries in the compressed file. By explicitly encoding the context boundaries, we present an O(n) reversal technique that is both efficient and effective. Finally, our study elucidates an inherently cache-friendly, and hitherto unobserved, behaviour in the reverse k-bwt, which could lead to new applications of the k-bwt transform. In contrast to previous empirical studies, we show the partial transform can be reversed significantly faster than the full transform, without significantly affecting compression effectiveness.
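The definitions in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it builds the rotation matrix naively, which costs quadratic space, and is intended only to show what "sorting to depth k" means. The function names `rotations`, `bwt` and `k_bwt` are hypothetical, and the tie-breaking rule (rotations that agree on their first k symbols keep their original text order) is an assumption, implemented here through Python's stable sort.

```python
def rotations(x):
    """Return the n cyclic rotations of x, in order of starting position."""
    return [x[i:] + x[:i] for i in range(len(x))]


def bwt(x):
    """Full transform: sort rotations lexicographically, take the last column."""
    rows = sorted(rotations(x))
    return "".join(row[-1] for row in rows)


def k_bwt(x, k):
    """Depth-limited transform: compare only the first k symbols of each rotation.

    Assumption: ties are left in original text order. sorted() is stable,
    and rotations() lists rows by starting position, so this holds here.
    """
    rows = sorted(rotations(x), key=lambda row: row[:k])
    return "".join(row[-1] for row in rows)


if __name__ == "__main__":
    text = "banana"
    print(bwt(text))       # nnbaaa
    print(k_bwt(text, 1))  # bnnaaa
    print(k_bwt(text, 2))  # nbnaaa
```

With k equal to the string length, the sort key is the whole rotation, so the sketch produces the same output as the full transform. Smaller k yields a different permutation of the same symbols.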
Similar resources
On variants of block-sorting compression using context from both the left and right
The block-sorting text compression algorithm can be viewed as associating a context with each character to be compressed, and then sorting the characters on their contexts. Normally, the context associated with each character is the string to the left of the character. Recently, Ratushnyak suggested that it might be possible instead to build a context by interleaving characters taken alternatel...
Lossless and Near-Lossless Compression of ECG Signals with Block-Sorting Techniques
In this work, we investigate the lossless and near-lossless compression of electrocardiogram (ECG) signals with different block-sorting transformations. We show that transformations with smaller context depths are a better choice for ECG signal compression when speed and memory utilization are considered. Further, we show that the compression results of our proposed technique are better than other w...
Enhanced Word-Based Block-Sorting Text Compression
The Block Sorting process of Burrows and Wheeler can be applied to any sequence in which symbols are (or might be) conditioned upon each other. In particular, it is possible to parse text into a stream of words, and then employ block sorting to identify and so exploit any conditioning relationships between words. In this paper we build upon the previous work of two of the authors, describing se...
Text Compression using Recency Rank with Context and Relation to Context Sorting, Block Sorting and PPM*
The block-sorting compression scheme was developed recently and its relation to statistical schemes has been studied, but its performance has not been analysed well theoretically. Context sorting is a compression scheme based on context similarity; it is regarded as an online version of block sorting and is asymptotically optimal. However, the compression speed is slower and the real performan...
Journal: Softw., Pract. Exper.
Volume: 42
Publication date: 2012